Smart Paper Notes

RDD2022: A multi-national image dataset for automatic Road Damage Detection

Deeksha Arya , Hiroya Maeda , Sanjay Kumar Ghosh , Durga Toshniwal , Yoshihide Sekimoto

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-09-18

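This data article describes the Road Damage Dataset RDD2022, which comprises 47,420 road images from six countries: Japan, India, the Czech Republic, Norway, the United States, and China. The images are annotated with more than 55,000 instances of road damage. The dataset captures four types of road damage: longitudinal cracks, transverse cracks, alligator cracks, and potholes. The annotated dataset is intended for developing deep learning-based methods that detect and classify road damage automatically. It has been released as part of the Crowd sensing-based Road Damage Detection Challenge (CRDDC2022), which invites researchers from around the world to propose solutions for automatic road damage detection in multiple countries. Municipalities and road agencies may use the RDD2022 dataset, and models trained on it, for low-cost automatic monitoring of road conditions. In addition, computer vision and machine learning researchers may use the dataset to benchmark different algorithms on other image-based applications of the same type (classification, object detection, etc.).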

Translated by Google Translate

Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

Leikun Yin , Rahul Ghosh , Chenxi Lin , David Hale , Christoph Weigl , James Obarowski , Junxiong Zhou , Jessica Till , Xiaowei Jia , Troy Mao

Categories: Computer Vision | Machine Learning

2023-01-01

Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.

An ensemble neural network approach to forecast Dengue outbreak based on climatic condition

Madhurima Panja , Tanujit Chakraborty , Sk Shahid Nadim , Indrajit Ghosh , Uttam Kumar , Nan Liu

Categories: Machine Learning

2022-12-16

Dengue fever is a virulent disease that has spread to more than 100 tropical and subtropical countries in Africa, the Americas, and Asia. This arboviral disease affects around 400 million people globally, severely distressing healthcare systems. The unavailability of a specific drug and a ready-to-use vaccine makes the situation worse. Hence, policymakers must rely on early warning systems to guide intervention-related decisions. Forecasts routinely provide critical information for dangerous epidemic events. However, the available forecasting models (e.g., weather-driven mechanistic, statistical time series, and machine learning models) lack a clear understanding of the different components needed to improve prediction accuracy, and they often provide unstable and unreliable forecasts. This study proposes an ensemble wavelet neural network with exogenous factor(s) (XEWNet) model that can produce reliable estimates for dengue outbreak prediction for three geographical regions, namely San Juan, Iquitos, and Ahmedabad. The proposed XEWNet model is flexible and can easily incorporate exogenous climate variable(s), confirmed by statistical causality tests, in its scalable framework. The proposed model is an integrated approach that incorporates wavelet transformation into an ensemble neural network framework, which helps in generating more reliable long-term forecasts. The proposed XEWNet allows complex non-linear relationships between dengue incidence cases and rainfall, while remaining mathematically interpretable, fast in execution, and easily comprehensible. The proposal's competitiveness is measured using computational experiments based on various statistical metrics and several statistical comparison tests. In comparison with statistical, machine learning, and deep learning methods, the proposed XEWNet performs better in 75% of the cases for short-term and long-term forecasting of dengue incidence.

Two-stream Multi-dimensional Convolutional Network for Real-time Violence Detection

Dipon Kumar Ghosh , Amitabha Chakrabarty

Categories: Computer Vision

2022-11-08

The increasing number of surveillance cameras and growing security concerns have made automatic detection of violent activity in surveillance footage an active area of research. Modern deep learning methods have achieved good accuracy in violence detection and have proved successful because of their applicability in intelligent surveillance systems. However, the models are computationally expensive and large in size because of their inefficient methods for feature extraction. This work presents a novel architecture for violence detection called Two-stream Multi-dimensional Convolutional Network (2s-MDCN), which uses RGB frames and optical flow to detect violence. Our proposed method extracts temporal and spatial information independently by 1D, 2D, and 3D convolutions. Despite combining multi-dimensional convolutional networks, our models are lightweight and efficient due to reduced channel capacity, yet they learn to extract meaningful spatial and temporal information. Additionally, combining RGB frames and optical flow yields 2.2% more accuracy than a single RGB stream. Despite their lower complexity, our models obtained state-of-the-art accuracy of 89.7% on the largest violence detection benchmark dataset.

Evaluating Impact of Social Media Posts by Executives on Stock Prices

Anubhav Sarkar , Swagata Chakraborty , Sohom Ghosh , Sudip Kumar Naskar

Categories: Natural Language Processing

2022-11-01

Predicting stock market movements has always been of great interest to investors and an active area of research. Research has shown that the popularity of products is highly influenced by what people talk about. Social media platforms such as Twitter and Reddit have become hotspots of such influence. This paper investigates the impact of social media posts on close price prediction of stocks using Twitter and Reddit posts. Our objective is to integrate the sentiment of social media data with historical stock data and study its effect on closing prices using time series models. We carried out rigorous experiments and in-depth analysis using multiple deep learning-based models on different datasets to study the influence of posts by executives and by the general public on the close price. Experimental results on multiple stocks (Apple and Tesla) and decentralised currencies (Bitcoin and Ethereum) consistently show improvements in prediction when social media data is included, and greater improvements when executive posts are included.

ParaColorizer: Realistic Image Colorization using Parallel Generative Networks

Himanshu Kumar , Abeer Banerjee , Sumeet Saurav , Sanjay Singh

Categories: Computer Vision

2022-08-17

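Grayscale image colorization is a fascinating application of AI for information restoration. The inherently ill-posed nature of the problem makes it even more challenging, since the outputs can be multi-modal. The learning-based methods currently in use produce acceptable results for straightforward cases but usually fail to restore contextual information in the absence of clear figure-ground separation. Also, the images suffer from color bleeding and desaturated backgrounds, since a single model trained on full-image features is insufficient for learning the diverse data modes. To address these issues, we present a GAN-based colorization framework. In our approach, each separately tailored GAN pipeline colorizes either the foreground (using object-level features) or the background (using full-image features). The foreground pipeline employs a Residual-UNet with self-attention as its generator, trained using the full-image features and the corresponding object-level features from the COCO dataset. The background pipeline relies on full-image features and additional training examples from the Places dataset. We design a DenseFuse-based fusion network to obtain the final colorized image by feature-based fusion. We show the shortcomings of the non-perceptual evaluation metrics commonly used to assess multi-modal problems such as image colorization, and we perform an extensive performance evaluation of our framework using multiple perceptual metrics. Our approach outperforms most existing learning-based methods and produces results comparable to the state of the art. Further, we performed a runtime analysis and obtained an average inference time of 24 ms per image.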

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava , Abhinav Rastogi , Abhishek Rao , Abu Awal Md Shoeb , Abubakar Abid , Adam Fisch , Adam R. Brown , Adam Santoro , Aditya Gupta , Adrià Garriga-Alonso

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2022-06-09

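Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. To inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.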

Impact of the composition of feature extraction and class sampling in medicare fraud detection

Akrity Kumari , Narinder Singh Punn , Sanjay Kumar Sonbhadra , Sonali Agarwal

Categories: Machine Learning

2022-06-03

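Because healthcare is a critical concern, health insurance has become an important scheme for minimizing medical expenses. As insurance coverage has grown, the healthcare industry has seen a significant increase in fraudulent activity, and fraud has become a significant contributor to rising medical care expenses, although its impact can be mitigated using fraud detection techniques. Machine learning techniques are used to detect fraud. This study uses the "Medicare Part D" insurance claims released by the Centers for Medicare and Medicaid Services (CMS) of the United States federal government to develop a fraud detection system. Applying machine learning algorithms to a class-imbalanced and high-dimensional Medicare dataset is a challenging task. To address these challenges, the present work combines feature extraction with data sampling before applying various classification algorithms, in order to obtain better performance. Feature extraction is a dimensionality reduction approach that converts attributes into linear or non-linear combinations of the actual attributes, generating a smaller and more diversified set of attributes and thus reducing the dimensions. Data sampling is commonly used to address class imbalance, either by increasing the frequency of the minority class or by reducing the frequency of the majority class, so that both classes have approximately equal numbers of occurrences. The proposed approach is evaluated through standard performance metrics. Thus, to detect fraud efficiently, this study applies an autoencoder as the feature extraction technique, the synthetic minority oversampling technique (SMOTE) as the data sampling technique, and various decision tree-based classifiers as the classification algorithms. The experimental results show that the combination of autoencoders followed by SMOTE, with the LightGBM classifier, achieved the best results.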
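
The minority-class oversampling idea described above can be illustrated with a short sketch. This is a simplified, SMOTE-style interpolation written with the standard library only; it is not the paper's pipeline and not the imbalanced-learn implementation. The function name `smote_like_oversample`, the parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen minority point and one of its k nearest minority
    neighbours (a simplified, illustrative variant of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: four minority-class points in two dimensions.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(minority_points, n_new=6)
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without exact duplicates. In a setup like the one the abstract describes, the inputs would presumably be the autoencoder's extracted features rather than raw attributes.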

A study on native American English speech recognition by Indian listeners with varying word familiarity level

Abhayjeet Singh , Achuth Rao MV , Rakesh Vaideeswaran , Chiranjeevi Yarra , Prasanta Kumar Ghosh

Categories: Natural Language Processing

2021-12-08

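In this study, listeners of varied Indian nativities are asked to listen to and recognize TIMIT utterances spoken by American speakers. We collect three kinds of responses from each listener while they recognize an utterance: (1) a sentence difficulty rating, (2) a speaker difficulty rating, and (3) a transcription of the utterance. From these transcriptions, the word error rate (WER) is calculated and used as a metric to evaluate the similarity between the recognized and the original sentences. The sentences selected in this study are categorized into three groups, easy, medium, and hard, based on the frequency of occurrence of the words in them. We observe that the sentence difficulty ratings, speaker difficulty ratings, and WERs increase from the easy to the hard category of sentences. We also compare human speech recognition (HSR) performance with that of three automatic speech recognition (ASR) systems under the following three combinations of acoustic model (AM) and language model (LM): ASR1) AM trained with recordings from speakers of Indian origin and LM built on TIMIT text; ASR2) AM trained with recordings from native American speakers and LM built on text from the LibriSpeech corpus; and ASR3) AM trained with recordings from native American speakers and LM built on LibriSpeech and TIMIT text. We observe that HSR performance is similar to that of ASR1, whereas ASR3 achieves the best performance. Speaker nativity-wise analysis shows that utterances from speakers of some nativities are more difficult for Indian listeners to recognize than those from a few other nativities.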
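
The transcription-based similarity score described above is conventionally computed as a word error rate: the word-level edit distance between the reference sentence and the listener's transcription, divided by the number of reference words. The following is a minimal sketch, assuming whitespace tokenization and case-insensitive matching; neither is stated in the abstract, and the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
example = word_error_rate("a b c d", "a x c")  # 0.5
```

Because insertions count as errors, this ratio can exceed 1.0 when the transcription is much longer than the reference.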

Semantic Segmentation of Legal Documents via Rhetorical Roles

Vijit Malik , Rishabh Sanjay , Shouvik Kumar Guha , Shubham Kumar Nigam , Angshuman Hazarika , Arnab Bhattacharya , Ashutosh Modi

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2021-12-03

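Legal documents are unstructured, use legal jargon, and are considerably long, which makes them difficult to process automatically with conventional text processing techniques. A legal document processing system would benefit substantially if documents could be semantically segmented into coherent units of information. This paper proposes a Rhetorical Roles (RR) system for segmenting a legal document into semantically coherent units: facts, arguments, statute, issue, precedent, ruling, and ratio. With the help of legal experts, we propose a set of 13 fine-grained rhetorical role labels and create a new corpus of legal documents annotated with the proposed RR. We develop a system for segmenting a document into rhetorical role units. In particular, we develop a multitask learning-based deep learning model with document rhetorical role label shift as an auxiliary task for segmenting a legal document. We experiment extensively with various deep learning models for predicting rhetorical roles in a document, and the proposed model shows superior performance over existing models. Further, we apply RR to predict the judgment of legal cases and show that using RR improves prediction compared with transformer-based models.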
